
    Generalized Shortest Path Kernel on Graphs

    We consider the problem of classifying graphs using graph kernels. We define a new graph kernel, called the generalized shortest path kernel, based on the number and length of shortest paths between nodes. For our example classification problem, we consider the task of classifying random graphs from two well-known families by the number of clusters they contain. We verify empirically that the generalized shortest path kernel outperforms the original shortest path kernel on a number of datasets. We give a theoretical analysis explaining our experimental results. In particular, we estimate distributions of the expected feature vectors for the shortest path kernel and the generalized shortest path kernel, and we show evidence explaining why our graph kernel outperforms the shortest path kernel on our graph classification problem. Comment: Short version presented at Discovery Science 2015 in Banff.
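    To make the idea concrete, here is a minimal, hedged sketch (not the authors' implementation) of feature maps for the shortest path kernel and a generalized variant that also counts how many shortest paths connect each node pair, as the abstract describes. The linear kernel, the path-count cap, and the random test graphs are illustrative assumptions.

```python
# Sketch of shortest-path (SP) and generalized shortest-path (GSP) kernel features.
from collections import Counter
from itertools import combinations
import networkx as nx

def sp_features(G):
    """SP kernel features: histogram of shortest-path lengths over node pairs."""
    lengths = dict(nx.all_pairs_shortest_path_length(G))
    return Counter(lengths[u][v] for u, v in combinations(G.nodes, 2) if v in lengths[u])

def gsp_features(G, max_paths=5):
    """Generalized features: histogram over (length, number of shortest paths) pairs,
    with the path count capped at max_paths (a hypothetical truncation)."""
    feats = Counter()
    for u, v in combinations(G.nodes, 2):
        try:
            paths = list(nx.all_shortest_paths(G, u, v))
        except nx.NetworkXNoPath:
            continue
        feats[(len(paths[0]) - 1, min(len(paths), max_paths))] += 1
    return feats

def kernel(f1, f2):
    """Linear kernel between two sparse feature maps."""
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())

G1, G2 = nx.erdos_renyi_graph(30, 0.1, seed=0), nx.erdos_renyi_graph(30, 0.1, seed=1)
print(kernel(gsp_features(G1), gsp_features(G2)))
```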

    A Principled Approach to Analyze Expressiveness and Accuracy of Graph Neural Networks

    Graph neural networks (GNNs) have seen increasing success recently, with many GNN variants achieving state-of-the-art results on node and graph classification tasks. The proposed GNNs, however, often implement complex node and graph embedding schemes, which makes it challenging to explain their performance. In this paper, we investigate the link between a GNN's expressiveness, that is, its ability to map different graphs to different representations, and its generalization performance in a graph classification setting. In particular, we propose a principled experimental procedure in which we (i) define a practical measure of expressiveness, (ii) introduce an expressiveness-based loss function that we use to train a simple yet practical GNN that is permutation-invariant, and (iii) illustrate our procedure on benchmark graph classification problems and on an original real-world application. Our results reveal that expressiveness alone does not guarantee better performance, and that a powerful GNN should be able to produce graph representations that are well separated with respect to the class of the corresponding graphs.
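    The following is a minimal numpy sketch of a permutation-invariant GNN of the general kind discussed above: sum-aggregation message passing followed by a sum readout, so that relabelling the nodes leaves the graph embedding unchanged. The two-layer depth, tanh nonlinearity, and layer sizes are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_embed(A, X, Ws):
    """A: (n, n) adjacency matrix, X: (n, d) node features, Ws: list of weight matrices."""
    H = X
    for W in Ws:
        H = np.tanh((A + np.eye(A.shape[0])) @ H @ W)  # aggregate neighbours plus self, then transform
    return H.sum(axis=0)  # sum readout: invariant to node permutations

n, d, h = 6, 4, 8
A = rng.integers(0, 2, size=(n, n)); A = np.triu(A, 1); A = A + A.T  # random undirected graph
X = rng.normal(size=(n, d))
Ws = [rng.normal(size=(d, h)), rng.normal(size=(h, h))]

perm = rng.permutation(n)
z1 = gnn_embed(A, X, Ws)
z2 = gnn_embed(A[np.ix_(perm, perm)], X[perm], Ws)
print(np.allclose(z1, z2))  # True: the graph embedding does not depend on node ordering
```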

    A quantitative approach to study indirect effects among disease proteins in the human protein interaction network

    Background: Systems biology makes it possible to study larger and more intricate systems than before, so it is now possible to look at the molecular basis of several diseases in parallel. Analyzing the interaction network of proteins in the cell can be the key to understanding how complex processes lead to diseases. Novel tools in network analysis provide the possibility to quantify the key interacting proteins in large networks as well as the proteins that connect them. Here we suggest a new method to study the relationships between topology and functionality of the protein-protein interaction network, by identifying key mediator proteins that possibly maintain indirect relationships among proteins causing various diseases. Results: Based on the i2d and OMIM databases, we have constructed (i) a network of proteins causing five selected diseases (DP, disease proteins) plus their interacting partners (IP, non-disease proteins), the DPIP network, and (ii) a protein network showing only these IPs and their interactions, the IP network. The five investigated diseases were (1) various cancers, (2) heart diseases, (3) obesity, (4) diabetes and (5) autism. We have quantified the number and strength of IP-mediated indirect effects between the five groups of disease proteins and hypothetically identified the most important mediator proteins linking heart disease to obesity or diabetes in the IP network. The results present the relationship between mediator role and centrality, as well as between mediator role and functional properties of these proteins. Conclusions: We show that a protein which plays an important indirect mediator role between two diseases is not necessarily a hub in the PPI network. This may suggest that, even if hub proteins and disease proteins are trivially of great interest, mediators may also deserve more attention, especially if disease-disease associations are to be understood. Identifying the hubs may not be sufficient to understand particular pathways. We have found that the mediators between heart diseases and obesity, as well as between heart diseases and diabetes, are of relatively high functional importance in the cell. The mediator proteins suggested here should be experimentally tested as products of hypothetical disease-related proteins.
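    A hedged sketch of the kind of indirect-effect analysis described above: count how often each candidate mediator (non-disease) protein lies on a shortest path between proteins of two disease groups in a protein-protein interaction network. The toy edges and protein names are invented placeholders; the actual study was built on the i2d and OMIM databases.

```python
from collections import Counter
import networkx as nx

# Toy PPI network: HD* are heart-disease proteins, OB* obesity proteins, M* candidate mediators.
ppi = nx.Graph([("HD1", "M1"), ("M1", "OB1"), ("HD1", "M2"), ("M2", "M3"), ("M3", "OB2")])
heart, obesity = {"HD1"}, {"OB1", "OB2"}

mediator_counts = Counter()
for s in heart:
    for t in obesity:
        for path in nx.all_shortest_paths(ppi, s, t):
            # Count interior nodes of the path that are not themselves disease proteins.
            mediator_counts.update(p for p in path[1:-1] if p not in heart | obesity)

print(mediator_counts.most_common())  # proteins mediating the most heart-obesity shortest paths
```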

    The Born supremacy: quantum advantage and training of an Ising Born machine

    The search for an application of near-term quantum devices is widespread. Quantum Machine Learning is touted as a potential utilisation of such devices, particularly those which are out of the reach of the simulation capabilities of classical computers. In this work, we propose a generative Quantum Machine Learning model, called the Ising Born Machine (IBM), which we show cannot, in the worst case and up to suitable notions of error, be simulated efficiently by a classical device. We also show this holds for all the circuit families encountered during training. In particular, we explore quantum circuit learning using non-universal circuits derived from Ising Model Hamiltonians, which are implementable on near-term quantum devices. We propose two novel training methods for the IBM, utilising the Stein Discrepancy and the Sinkhorn Divergence cost functions. We show numerically, both using a simulator within Rigetti's Forest platform and on the Aspen-1 16Q chip, that the cost functions we suggest outperform the more commonly used Maximum Mean Discrepancy (MMD) for differentiable training. We also propose an improvement to the MMD via a novel use of quantum kernels, which we demonstrate provides improvements over its classical counterpart. We discuss the potential of these methods to learn 'hard' quantum distributions, a feat which would demonstrate the advantage of quantum over classical computers, and provide the first formal definitions for what we call 'Quantum Learning Supremacy'. Finally, we propose a novel view on the area of quantum circuit compilation by using the IBM to 'mimic' target quantum circuits using classical output data only. Comment: v3: close to the journal published version, with a significant restructuring into main text and appendices (see v2 for the unsplit version); v2: typos corrected, figures altered slightly; v1: 68 pages, 39 figures. Comments welcome. Implementation at https://github.com/BrianCoyle/IsingBornMachin
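    As a point of reference, here is a minimal sketch of the Maximum Mean Discrepancy (MMD) estimator mentioned above, with a Gaussian kernel, as one would use it to compare bitstring samples drawn from a Born machine against training data. The bandwidth and the random samples are illustrative assumptions; the Stein and Sinkhorn cost functions and the quantum kernel studied in the paper are not shown.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd_squared(X, Y, sigma=1.0):
    """Plug-in (biased, V-statistic) estimate of MMD^2 between sample sets X and Y (rows)."""
    kxx = np.mean([gaussian_kernel(a, b, sigma) for a in X for b in X])
    kyy = np.mean([gaussian_kernel(a, b, sigma) for a in Y for b in Y])
    kxy = np.mean([gaussian_kernel(a, b, sigma) for a in X for b in Y])
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
model_samples = rng.integers(0, 2, size=(100, 4))   # e.g. bitstrings sampled from the model
data_samples = rng.integers(0, 2, size=(100, 4))    # bitstrings from the target distribution
print(mmd_squared(model_samples, data_samples))
```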

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short-read RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin; this is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms; this is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms; either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression; these include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation; various tools facilitate the visualization of RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections added.
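    As a toy illustration of one quantification task in the list above (point 3), here is a sketch of estimating the relative usage ("percent spliced-in", PSI) of an exon-skipping event from read counts supporting inclusion versus exclusion. The length normalisation is a simplified assumption; real tools model junctions, read lengths and uncertainty far more carefully.

```python
def psi(inclusion_reads, exclusion_reads, inclusion_len=2, exclusion_len=1):
    """PSI estimate from junction read counts, normalised by the number of supporting junctions."""
    inc = inclusion_reads / inclusion_len   # the inclusion isoform is supported by more junctions
    exc = exclusion_reads / exclusion_len
    return inc / (inc + exc) if (inc + exc) > 0 else float("nan")

print(psi(inclusion_reads=80, exclusion_reads=20))  # about 0.67: the event is mostly included
```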

    GENN: A GEneral Neural Network for Learning Tabulated Data with Examples from Protein Structure Prediction

    We present a GEneral Neural Network (GENN) for learning trends from existing data and making predictions of unknown information. The main novelty of GENN is its generality, simplicity of use, and its specific handling of windowed input/output. Its main strength is its efficient handling of the input data, enabling learning from large datasets. GENN is built on a two-layered neural network and has the option to use separate input–output pairs or window-based data, using data structures that efficiently represent input–output pairs. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to similarity to native, and predicting protein disorder, and has performed remarkably well. In this paper we describe the program and its use. Specifically, we give as an example the construction of a similarity-to-native protein scoring function that was built using GENN. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
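    Below is an illustrative sketch (not the GENN code itself, which is distributed at the URLs above) of the window-based input handling the abstract describes: a sliding window over per-residue features is flattened and fed to a small two-layer network. The window size, layer width, synthetic data, and the use of scikit-learn's MLPRegressor are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def windowed(features, window=7):
    """Stack a sliding window of per-position feature rows into one flattened input row each."""
    half = window // 2
    padded = np.pad(features, ((half, half), (0, 0)))  # zero-pad the ends of the sequence
    return np.array([padded[i:i + window].ravel() for i in range(len(features))])

rng = np.random.default_rng(0)
per_residue = rng.normal(size=(200, 20))   # e.g. 200 residues with 20 features each
targets = rng.uniform(size=200)            # e.g. relative accessible surface area per residue

X = windowed(per_residue, window=7)
model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
model.fit(X, targets)
print(model.predict(X[:3]))
```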

    Automatic prediction of catalytic residues by modeling residue structural neighborhood

    Background: Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simpler formulation, the problem can be cast as a binary classification task at the residue level: predicting whether the residue is directly involved in the catalytic process. The task is quite hard even when structural information is available, due to the rather wide range of roles a functional residue can play and to the large imbalance between the number of catalytic and non-catalytic residues. Results: We developed an effective representation of structural information by modeling spherical regions around candidate residues and extracting statistics on the properties of their content, such as physico-chemical properties, atomic density, flexibility, and the presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminant power of the information provided by the presence of heterogens in the residue neighborhood. Conclusions: Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for these results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvement.
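    A hedged sketch of the kind of spherical-neighbourhood featurisation described above: for each candidate residue, collect the atoms within a fixed radius and summarise simple statistics (atomic density, mean hydrophobicity, nearby water count), then train an SVM with class weighting to counter the catalytic/non-catalytic imbalance. The radius, the feature choices, and the toy data are assumptions standing in for the paper's richer descriptors.

```python
import numpy as np
from sklearn.svm import SVC

def neighbourhood_features(coords, residue_xyz, hydrophobicity, is_water, radius=8.0):
    """Summary statistics of the atoms inside a sphere around one candidate residue."""
    d = np.linalg.norm(coords - residue_xyz, axis=1)
    mask = d < radius
    volume = 4.0 / 3.0 * np.pi * radius ** 3
    return [mask.sum() / volume,                                   # atomic density
            hydrophobicity[mask].mean() if mask.any() else 0.0,    # mean hydrophobicity
            is_water[mask].sum()]                                  # nearby water molecules

rng = np.random.default_rng(0)
coords = rng.normal(scale=20, size=(500, 3))        # toy atom coordinates
hydro = rng.normal(size=500)                        # toy per-atom hydrophobicity
water = rng.integers(0, 2, size=500).astype(bool)   # toy water flags

residues_xyz = rng.normal(scale=20, size=(60, 3))
X = np.array([neighbourhood_features(coords, r, hydro, water) for r in residues_xyz])
y = rng.integers(0, 2, size=60)                     # toy catalytic / non-catalytic labels

clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)  # weighting counters class imbalance
print(clf.predict(X[:5]))
```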

    Is EC class predictable from reaction mechanism?

    We thank the Scottish Universities Life Sciences Alliance (SULSA) and the Scottish Overseas Research Student Awards Scheme of the Scottish Funding Council (SFC) for financial support. Background: We investigate the relationships between the EC (Enzyme Commission) class, the associated chemical reaction, and the reaction mechanism by building predictive models using Support Vector Machine (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN) classifiers. We consider two ways of encoding the reaction mechanism in descriptors, and also three approaches that encode only the overall chemical reaction. Both cross-validation and an external test set are used. Results: The three descriptor sets encoding the overall chemical transformation perform better than the two descriptions of mechanism. SVM and RF models perform comparably well; kNN is less successful. Oxidoreductases and hydrolases are relatively well predicted by all types of descriptor; isomerases are well predicted by overall-reaction descriptors but not by mechanistic ones. Conclusions: Our results suggest that pairs of similar enzyme reactions tend to proceed by different mechanisms. Oxidoreductases, hydrolases, and to some extent isomerases and ligases, have clear chemical signatures, making them easier to predict than transferases and lyases. We find evidence that isomerases as a class are notably mechanistically diverse and that their one shared property, of substrate and product being isomers, can arise in various unrelated ways. The performance of the different machine learning algorithms is in line with many cheminformatics applications, with SVM and RF being roughly equally effective; kNN is less successful, given the role that non-local information plays in successful classification. We note also that, despite a lack of clarity in the literature, EC number prediction is not a single problem; the challenge of predicting protein function from available sequence data is quite different from that of assigning an EC classification from a cheminformatics representation of a reaction.
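    The comparison set-up described above can be sketched in a few lines: the same reaction descriptors are fed to SVM, Random Forest and k-nearest-neighbour classifiers and scored by cross-validation. The random descriptor matrix and EC-class labels below are placeholders; the real study used mechanism-based and overall-reaction descriptor sets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))      # placeholder reaction descriptors
y = rng.integers(1, 7, size=300)    # placeholder EC top-level classes 1-6

models = {
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```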